The combined effect of foreign direct investment on firm productivity
This paper examines the economic implications of combining
inward foreign direct investment (IFDI) and outward foreign
direct investment (OFDI) by constructing a panel fixed
effects model using Chinese industrial firm-level data for the
period 1998–2013. Specifically, we focus on the impact of combining
IFDI and OFDI on firm productivity in China. We also introduce
interactive terms into the model to explore the direct and
indirect mechanisms through which IFDI and OFDI affect productivity
growth. The results show that IFDI and OFDI work together
to contribute to productivity growth by acting directly on the
level of technology, thereby increasing productivity. IFDI intensifies
market concentration, which in turn positively moderates the
relationship between OFDI and productivity. Furthermore, IFDI
moderates the financing constraints of firms, but has a weaker
effect; the easing of financing constraints facilitates the positive
impact of OFDI on productivity. Absorptive capacity favours IFDI
spillover, but OFDI inhibits absorptive capacity improvements. Our
in-depth analysis of the mechanism of the combined impact of
IFDI and OFDI on productivity reveals the objectives of using this
combination, thereby providing theoretical support and policy
recommendations for the implementation of this strategy.
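As a rough illustration of the kind of model the abstract describes, a fixed-effects specification with an interaction term could be written as below. The notation is an assumption and not taken from the paper: TFP stands for firm productivity, X for a vector of controls, mu_i for a firm fixed effect, and lambda_t for a year effect (the abstract only states "panel fixed effects", so the year effect is hypothetical).

```latex
\[
\mathit{TFP}_{it} = \beta_1\,\mathit{IFDI}_{it} + \beta_2\,\mathit{OFDI}_{it}
  + \beta_3\,(\mathit{IFDI}_{it} \times \mathit{OFDI}_{it})
  + \gamma^{\top} X_{it} + \mu_i + \lambda_t + \varepsilon_{it}
\]
```

In such a sketch, the sign and size of beta_3 would indicate whether IFDI and OFDI reinforce each other's effect on productivity.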
Understanding Hidden Memories of Recurrent Neural Networks
Recurrent neural networks (RNNs) have been successfully applied to various
natural language processing (NLP) tasks and achieved better results than
conventional methods. However, the lack of understanding of the mechanisms
behind their effectiveness limits further improvements on their architectures.
In this paper, we present a visual analytics method for understanding and
comparing RNN models for NLP tasks. We propose a technique to explain the
function of individual hidden state units based on their expected response to
input texts. We then co-cluster hidden state units and words based on the
expected response and visualize co-clustering results as memory chips and word
clouds to provide more structured knowledge on RNNs' hidden states. We also
propose a glyph-based sequence visualization based on aggregate information to
analyze the behavior of an RNN's hidden state at the sentence-level. The
usability and effectiveness of our method are demonstrated through case studies
and reviews from domain experts.
Comment: Published at IEEE Conference on Visual Analytics Science and
Technology (IEEE VAST 2017)
Word-Graph2vec: An efficient word embedding approach on word co-occurrence graph using random walk sampling
Word embedding has become ubiquitous and is widely used in various text
mining and natural language processing (NLP) tasks, such as information
retrieval, semantic analysis, and machine translation, among many others.
Unfortunately, it is prohibitively expensive to train word embeddings on a
relatively large corpus. We propose a graph-based word embedding algorithm,
called Word-Graph2vec, which converts the large corpus into a word
co-occurrence graph, samples word sequences from this graph by random walks,
and finally trains the word embedding on the sampled corpus. We posit that,
because of the stable vocabulary, relative idioms, and fixed
expressions in English, the size and density of the word co-occurrence graph
change only slightly as the training corpus grows. As a result,
Word-Graph2vec has a stable runtime on large-scale data sets, and its
performance advantage becomes more pronounced as the
training corpus grows. Extensive experiments conducted on real-world datasets show
that the proposed algorithm is four to five times more efficient than
traditional Skip-Gram, while the error introduced by the random walk
sampling is small.
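The pipeline described in this abstract (corpus to co-occurrence graph, then random-walk sampling) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the window size, the count-based edge weights, the walk parameters, and all function names are hypothetical, and the final embedding-training step is omitted.

```python
import random
from collections import defaultdict


def build_cooccurrence_graph(sentences, window=2):
    """Build an undirected weighted graph: graph[u][v] counts how often
    u and v appear within `window` tokens of each other (assumed weighting)."""
    graph = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for i, u in enumerate(tokens):
            for v in tokens[i + 1 : i + 1 + window]:
                if u != v:
                    graph[u][v] += 1
                    graph[v][u] += 1
    return graph


def random_walk(graph, start, length, rng):
    """Walk the graph, picking each next word with probability
    proportional to its co-occurrence count with the current word."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:
            break
        words = list(neighbors)
        weights = [neighbors[w] for w in words]
        walk.append(rng.choices(words, weights=weights, k=1)[0])
    return walk


def sample_corpus(graph, walks_per_node, walk_length, seed=0):
    """Generate a sampled corpus of word sequences from the graph."""
    rng = random.Random(seed)
    corpus = []
    for node in list(graph):
        for _ in range(walks_per_node):
            corpus.append(random_walk(graph, node, walk_length, rng))
    return corpus


sentences = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]
graph = build_cooccurrence_graph(sentences)
walks = sample_corpus(graph, walks_per_node=2, walk_length=5)
```

The sampled `walks` would then be passed to any Skip-Gram trainer in place of the original corpus. The cost of sampling depends on the graph and the walk parameters, not on the raw corpus length, which is the intuition behind the stable runtime claimed above.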
LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models
Language model (LM) based audio generation frameworks, e.g., AudioLM, have
recently achieved new state-of-the-art performance in zero-shot audio
generation. In this paper, we explore the feasibility of LMs for zero-shot
voice conversion. An intuitive approach is to follow AudioLM: tokenize
speech into semantic and acoustic tokens with HuBERT and
SoundStream, respectively, and convert source semantic tokens to target acoustic tokens
conditioned on acoustic tokens of the target speaker. However, such an approach
encounters several issues: 1) the linguistic content contained in semantic
tokens may get dispersed during multi-layer modeling while the lengthy speech
input in the voice conversion task makes contextual learning even harder; 2)
the semantic tokens still contain speaker-related information, which may be
leaked to the target speech, lowering the target speaker similarity; 3) the
generation diversity in the sampling of the LM can produce unexpected outcomes
during inference, leading to unnatural pronunciation and speech quality
degradation. To mitigate these problems, we propose LM-VC, a two-stage language
modeling approach that generates coarse acoustic tokens for recovering the
source linguistic content and target speaker's timbre, and then reconstructs
the fine acoustic tokens, which carry the acoustic details, as converted speech. Specifically, to enhance
content preservation and facilitate better disentanglement, a masked prefix LM
with a mask prediction strategy is used for coarse acoustic modeling. This
model is encouraged to recover the masked content from the surrounding context
and generate target speech based on the target speaker's utterance and
corrupted semantic tokens. Besides, to further alleviate the sampling error in
the generation, an external LM, which employs window attention to capture the
local acoustic relations, is introduced to participate in the coarse acoustic
modeling.
Radial Icicle Tree (RIT): Node Separation and Area Constancy
Icicles and sunbursts are two commonly-used visual representations of trees.
While icicle trees can map data values faithfully to rectangles of different
sizes, often some rectangles are too narrow to be noticed easily. When an
icicle tree is transformed into a sunburst tree, the width of each rectangle
becomes the length of an annular sector that is usually longer than the
original width. While sunburst trees alleviate the problem of narrow rectangles
in icicle trees, they no longer maintain the consistency of size encoding. At
different tree depths, nodes of the same data values are displayed in annular
sectors of different sizes in a sunburst tree, though they are represented by
rectangles of the same size in an icicle tree. Furthermore, two nodes from
different subtrees could sometimes appear as a single node in both icicle trees
and sunburst trees. In this paper, we propose a new visual representation,
referred to as the radial icicle tree (RIT), which transforms the
rectangular bounding box of an icicle tree into a circle, circular sector, or
annular sector while introducing gaps between nodes and maintaining area
constancy for nodes of the same size. We applied the new visual design to
several datasets. Both the analytical design process and the user-centered
evaluation confirm that the new design improves on
icicle and sunburst trees without introducing any relative demerit.
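To make the idea of area constancy concrete, the sketch below computes the outer radius an annular sector needs to cover a given area, using the standard formula area = (angle / 2) * (outer^2 - inner^2). This illustrates the geometric constraint only and is not the paper's layout algorithm; the function names and sample values are hypothetical.

```python
import math


def outer_radius(inner_radius, angle, area):
    """Outer radius of an annular sector with the given inner radius,
    angular span (radians), and target area."""
    return math.sqrt(inner_radius ** 2 + 2.0 * area / angle)


def sector_area(inner_radius, outer, angle):
    """Area of an annular sector spanning `angle` radians."""
    return 0.5 * angle * (outer ** 2 - inner_radius ** 2)


# Two nodes with the same data value (same target area) and the same
# angular span, placed at different depths (assumed sample values).
angle = math.pi / 6
area = 1.5
shallow = outer_radius(1.0, angle, area)  # node near the center
deep = outer_radius(3.0, angle, area)     # node farther from the center

# Radial thickness each node needs in order to cover the same area.
shallow_thickness = shallow - 1.0
deep_thickness = deep - 3.0
```

With a fixed ring thickness, as in a conventional sunburst, the deeper node would cover more area. Holding the area constant instead requires the deeper sector to be radially thinner, so `deep_thickness` is smaller than `shallow_thickness`.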
Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints
Conveying the linguistic content and maintaining the source speech's speaking
style, such as intonation and emotion, is essential in voice conversion (VC).
However, in a low-resource situation, where only limited utterances from the
target speaker are accessible, existing VC methods struggle to meet this
requirement and to capture the target speaker's timbre. In this work, a novel VC
model, referred to as MFC-StyleVC, is proposed for the low-resource VC task.
Specifically, a speaker timbre constraint generated by a clustering method is
proposed to guide target speaker timbre learning in different stages.
Meanwhile, to prevent over-fitting to the target speaker's limited data,
perceptual regularization constraints explicitly maintain model performance on
specific aspects, including speaking style, linguistic content, and speech
quality. Besides, a simulation mode is introduced to simulate the inference
process to alleviate the mismatch between training and inference. Extensive
experiments performed on highly expressive speech demonstrate the superiority
of the proposed method in low-resource VC.
Comment: Accepted by ICASSP 202